The data are geographic and atmospheric measures on a very coarse 24 by 24 grid covering Central America. The data were obtained from the NASA Langley Research Center's Atmospheric Science Data Center.
The data are stored as a data cube from the dplyr package (tbl_cube). A tbl_cube has two components: dimensions, a named list of vectors, and measures, a named list of arrays.
Dimensions - latitude, longitude, month, and year.
Measures - Cloud coverage, ozone, surface temperature, temperature, and pressure.
summary(nasa)
## Length Class Mode
## mets 7 -none- list
## dims 4 -none- list
Latitude and longitude specify precise locations on the Earth's surface. Here, the 576 locations are spread evenly over the 24 by 24 grid.
Each measure was recorded monthly for 6 years (1995 to 2000) at all 576 locations.
Cloud coverage is the fraction of the sky obscured by clouds at a particular location; the higher the number, the greater the coverage. It is recorded at three levels (low, middle, and high) for each location.
Ozone protects us from the sun's UV rays and is measured in Dobson units (DU). One Dobson unit corresponds to a 0.01 millimeter layer of pure ozone at standard temperature and pressure. For reference, the average total ozone in the atmosphere is around 300 DU.
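Since one Dobson unit is a 0.01 mm layer, converting a reading to layer thickness is a single multiplication. A minimal base-R sketch (du_to_mm is a hypothetical helper name, not part of any package):

```r
# Convert total-column ozone in Dobson units (DU) to the thickness, in mm,
# the ozone would have as a pure layer at standard temperature and pressure.
du_to_mm <- function(du) du * 0.01   # 1 DU = 0.01 mm

du_to_mm(300)               # typical total ozone: about 3 mm
du_to_mm(c(244, 250, 326))  # a few values in the range seen in this data set
```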
Pressure is the atmospheric pressure, measured in millibars (1 millibar = 1 hectopascal). For reference, the average pressure at sea level is around 1013 millibars.
Temperature is the degree or intensity of heat in the atmosphere. It is measured in kelvin (K) rather than Celsius or Fahrenheit.
Surface temperature is the temperature measured at the Earth's surface, also in kelvin.
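Because both temperatures are in kelvin, values around 300 can look unfamiliar. A small base-R sketch for converting them (k_to_c and k_to_f are hypothetical helper names):

```r
# Convert kelvin to degrees Celsius and degrees Fahrenheit.
k_to_c <- function(k) k - 273.15
k_to_f <- function(k) (k - 273.15) * 9 / 5 + 32

k_to_c(300.5)  # about 27.35 degrees C
k_to_f(300.5)  # about 81.23 degrees F
```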
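library(dplyr)  # nasa and tbl_cube; in dplyr >= 1.0.0 they live in the cubelyr package
dfnasa <- as.data.frame(nasa)
# every 576th row (1, 577, ..., 6337): the 12 months of 1995 at the first grid cell
year1 <- slice(dfnasa, seq(1, by = 576, length.out = 12))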
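I converted the nasa cube into a data frame with one row per latitude, longitude, month, and year. Using slice(), I extracted the 12 monthly rows for 1995 at the first grid cell into year1; each later year through 2000 starts 6,912 rows (576 × 12) further down.
Using the highcharter package, I created line graphs of the monthly values in year1 to explore basic relationships between the measures.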
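The slice() indices follow from the row layout: 576 grid cells per month and 12 months per year. A base-R sketch of that arithmetic, assuming rows are ordered by grid cell within month within year, as the indices used for year1 suggest (row_index is a hypothetical helper):

```r
# Assumed layout: 576 rows per month, 12 months per year, 6 years.
n_cells  <- 24 * 24   # 576 grid cells
n_months <- 12
n_years  <- 6         # 1995 to 2000

# Row number for a grid cell (1..576), month (1..12) and year index (1..6).
row_index <- function(cell, month, year_idx) {
  (year_idx - 1) * n_cells * n_months + (month - 1) * n_cells + cell
}

n_cells * n_months * n_years  # total rows: 41472
row_index(1, 1:12, 1)         # the rows passed to slice() for year1
row_index(1, 1:12, 2)         # the same grid cell in 1996
```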
library(highcharter)
highchart() %>%
hc_xAxis(categories = year1$month) %>%
hc_add_series(name = "High Cloud", data = year1$cloudhigh) %>%
hc_add_series(name = "Low Cloud", data = year1$cloudlow) %>%
hc_add_series(name = "Medium Cloud", data = year1$cloudmid) %>%
hc_add_theme(hc_theme_538())
Line graph comparing low, medium, and high cloud coverage by month.
highchart() %>%
hc_xAxis(categories = year1$month) %>%
hc_add_series(name = "Temperature", data = year1$temperature) %>%
hc_add_series(name = "Surface Temperature", data = year1$surftemp) %>%
hc_add_series(name = "Ozone", data = year1$ozone) %>%
hc_add_theme(hc_theme_economist())
Line graph comparing temperature, surface temperature, and ozone by month. The three series share one y-axis even though temperature is in kelvin and ozone in Dobson units.
highchart() %>%
hc_xAxis(categories = year1$month) %>%
hc_add_series(name = "Pressure", data = year1$pressure) %>%
hc_add_theme(hc_theme_ffx())
Line graph of the monthly pressure values, to show any general trend.
library(leaflet)
jan95 <- slice(dfnasa, 1:576)  # first 576 rows: every grid cell, January 1995
# colorNumeric() treats temperature as continuous; colorFactor() would make each distinct value a category
qpal <- colorNumeric(c("blue", "royalblue", "lightskyblue", "yellow", "orange", "red"), domain = jan95$temperature)
leaflet(jan95) %>%
  addTiles() %>%
  setView(lng = -90, lat = 10, zoom = 3) %>%
  addCircleMarkers(lat = ~lat, lng = ~long, color = ~qpal(temperature), stroke = FALSE, fillOpacity = 0.3,
                   popup = ~paste("Temperature:", temperature))  # ~ uses the same 576 rows as the markers
Using the leaflet package, I mapped the temperature at each of the 576 points of the 24 by 24 grid for January 1995 (the first 576 rows). Red marks warmer points and blue marks cooler ones.
jan95 <- slice(dfnasa, 1:576)  # every grid cell, January 1995
qpal <- colorFactor(c("blue", "yellow2", "orange", "firebrick1"), domain = jan95$pressure)
leaflet(jan95) %>%
  addTiles() %>%
  setView(lng = -90, lat = 10, zoom = 3) %>%
  addCircleMarkers(lat = ~lat, lng = ~long, color = ~qpal(pressure), stroke = FALSE, fillOpacity = 0.5,
                   popup = ~paste("Pressure:", pressure))  # popups come from the same 576 rows as the markers
This map shows the distribution of air pressure across the 24 by 24 grid for January 1995 (the first 576 rows).
Pearson's correlation coefficient measures the strength of the linear relationship between two variables, which makes the relationships between the measures clearer. As shown below, positive (blue) numbers indicate a positive correlation and negative (red) numbers a negative correlation. The closer a number is to -1 or 1, the stronger the correlation.
library(corrplot)
selected_var <- combine %>%
  select(cloudhigh, cloudmid, cloudlow, ozone, pressure, temperature, surftemp)
corr_nasa <- cor(selected_var)  # Pearson is the default method
corrplot(corr_nasa, method = "number")
Correlation with selected variables (cloud coverage, ozone, pressure, temperatures).
From the correlation table: low cloud coverage, temperature, and surface temperature were positively correlated with one another, and high cloud coverage, middle cloud coverage, and ozone formed a second positively correlated group. Air pressure did not appear to correlate with any of the other variables.
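cor() computes Pearson's r for every pair of columns. For a single pair, the coefficient can be sketched directly from its definition in base R; pearson_r and the toy vectors are hypothetical and serve only to check the formula against cor():

```r
# Pearson's r from its definition:
#   r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)), where dx, dy are deviations from the means
pearson_r <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

# Hypothetical toy vectors, not NASA data
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
pearson_r(x, y)                        # 6 / sqrt(60), about 0.775
all.equal(pearson_r(x, y), cor(x, y))  # TRUE
```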
I created several simple linear regression models, each with temperature as the dependent variable and one of the other measures as the single predictor.
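For a single predictor, the coefficients that lm() reports have a closed form: the slope is cov(x, y) / var(x), and the intercept is mean(y) minus the slope times mean(x). A base-R sketch on hypothetical toy data (ols_simple is not a package function):

```r
# Least-squares line for one predictor
ols_simple <- function(x, y) {
  slope <- cov(x, y) / var(x)
  c(intercept = mean(y) - slope * mean(x), slope = slope)
}

# Hypothetical toy data, not from the NASA set
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
ols_simple(x, y)   # intercept 0.05, slope 1.99
coef(lm(y ~ x))    # the same two numbers
```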
temp_lowc <- lm(temperature ~ cloudlow,data = combine)
temp_lowc
##
## Call:
## lm(formula = temperature ~ cloudlow, data = combine)
##
## Coefficients:
## (Intercept) cloudlow
## 277.7133 0.6569
library(ggplot2)
lowcg <- ggplot(combine, aes(x = cloudlow, y = temperature)) + geom_point() + xlab("Low Cloud Coverage") + ylab("Temperature") + geom_abline(intercept = coef(temp_lowc)[1], slope = coef(temp_lowc)[2], col = "indianred3")  # fitted line taken from the model
lowcg
temp_midc <- lm(temperature ~ cloudmid,data = combine)
temp_midc
##
## Call:
## lm(formula = temperature ~ cloudmid, data = combine)
##
## Coefficients:
## (Intercept) cloudmid
## 313.376 -1.168
midcg <- ggplot(combine, aes(x = cloudmid, y = temperature)) + geom_point() + xlab("Middle Cloud Coverage") + ylab("Temperature") + geom_abline(intercept = coef(temp_midc)[1], slope = coef(temp_midc)[2], col = "indianred3")
midcg
temp_highc <- lm(temperature ~ cloudhigh,data = combine)
temp_highc
##
## Call:
## lm(formula = temperature ~ cloudhigh, data = combine)
##
## Coefficients:
## (Intercept) cloudhigh
## 298.8933 -0.8519
highcg <- ggplot(combine, aes(x = cloudhigh, y = temperature)) + geom_point() + xlab("High Cloud Coverage") + ylab("Temperature") + geom_abline(intercept = coef(temp_highc)[1], slope = coef(temp_highc)[2], col = "indianred3")
highcg
temp_ozone <- lm(temperature ~ ozone,data = combine)
temp_ozone
##
## Call:
## lm(formula = temperature ~ ozone, data = combine)
##
## Coefficients:
## (Intercept) ozone
## 337.8292 -0.1526
ozoneg <- ggplot(combine, aes(x = ozone, y = temperature)) + geom_point() + xlab("Ozone Level") + ylab("Temperature") + geom_abline(intercept = coef(temp_ozone)[1], slope = coef(temp_ozone)[2], col = "indianred3")
ozoneg
temp_surftemp <- lm(temperature ~ surftemp,data = combine)
temp_surftemp
##
## Call:
## lm(formula = temperature ~ surftemp, data = combine)
##
## Coefficients:
## (Intercept) surftemp
## 86.0301 0.7002
surfg <- ggplot(combine, aes(x = surftemp, y = temperature)) + geom_point() + xlab("Surface Temperature") + ylab("Temperature") + geom_abline(intercept = coef(temp_surftemp)[1], slope = coef(temp_surftemp)[2], col = "indianred3")
surfg
temp_pres <- lm(temperature ~ pressure,data = combine)
temp_pres
##
## Call:
## lm(formula = temperature ~ pressure, data = combine)
##
## Coefficients:
## (Intercept) pressure
## 259.84484 0.03408
presg <- ggplot(combine, aes(x = pressure, y = temperature)) + geom_point() + xlab("Atmospheric Pressure") + ylab("Temperature") + geom_abline(intercept = coef(temp_pres)[1], slope = coef(temp_pres)[2], col = "indianred3")
presg
ggarrange() from the ggpubr package combines all six plots into one figure.
library(ggpubr)
figure <- ggarrange(lowcg, midcg, highcg, ozoneg, surfg, presg, ncol = 3, nrow = 2)
figure
Among the simple regressions, pressure was the only variable that showed no clear relationship with temperature, consistent with the correlation table. Therefore, the multiple linear regression model leaves pressure out.
model <- lm(temperature ~ cloudlow+cloudmid+cloudhigh+ozone+surftemp,data=combine)
summary(model)
##
## Call:
## lm(formula = temperature ~ cloudlow + cloudmid + cloudhigh +
## ozone + surftemp, data = combine)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.7001 -1.7232 -0.0064 1.7982 4.8737
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 20.43522 22.89516 0.893 0.375
## cloudlow -0.65061 0.10505 -6.194 4.27e-08 ***
## cloudmid 0.16998 0.10875 1.563 0.123
## cloudhigh -0.43951 0.08269 -5.315 1.35e-06 ***
## ozone 0.01669 0.02003 0.833 0.408
## surftemp 0.95383 0.06855 13.915 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.427 on 66 degrees of freedom
## Multiple R-squared: 0.9261, Adjusted R-squared: 0.9205
## F-statistic: 165.3 on 5 and 66 DF, p-value: < 2.2e-16
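Some of the summary figures can be checked by hand from the printed values: the residual degrees of freedom give n = 66 + 5 + 1 = 72 observations, the adjusted R-squared follows from the multiple R-squared, and each t value is the estimate divided by its standard error. A base-R sketch using the rounded numbers above:

```r
# Values copied from the summary(model) output
r2       <- 0.9261  # multiple R-squared
p        <- 5       # predictors: cloudlow, cloudmid, cloudhigh, ozone, surftemp
df_resid <- 66
n        <- df_resid + p + 1  # 72 observations

adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2, 4)    # 0.9205, as reported

# t value = Estimate / Std. Error (surftemp row)
t_surftemp <- 0.95383 / 0.06855
round(t_surftemp, 2)  # 13.91 (the summary shows 13.915, computed from unrounded values)
```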
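I sampled 50 random rows from the NASA data set; the first six are listed below:
# sample_n() draws a different sample on each run; call set.seed() first to make it reproducible
temp_pred <- sample_n(dfnasa, 50)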
head(temp_pred)
## lat long month year cloudhigh cloudlow cloudmid ozone
## 1 -11.217391 -71.22609 3 1998 31.0 6.5 27.0 244
## 2 -16.208696 -71.22609 5 1996 8.0 10.0 24.5 250
## 3 33.704348 -106.28696 5 1999 9.5 15.5 14.0 326
## 4 8.747826 -106.28696 1 2000 2.0 17.0 7.0 260
## 5 3.756522 -96.26957 4 1998 46.5 13.5 23.5 244
## 6 13.739130 -93.76522 4 1997 5.5 23.0 8.5 254
## pressure surftemp temperature
## 1 1000 294.6 300.5
## 2 680 288.3 285.8
## 3 925 297.4 291.7
## 4 1000 297.8 299.2
## 5 1000 301.4 301.4
## 6 1000 302.8 302.3
Data frame of 50 random rows from the NASA data set.
model_usage <- temp_pred %>% select(cloudhigh,cloudlow,cloudmid,ozone,surftemp)
real_temp <- temp_pred %>% select(temperature)
head(model_usage)
## cloudhigh cloudlow cloudmid ozone surftemp
## 1 31.0 6.5 27.0 244 294.6
## 2 8.0 10.0 24.5 250 288.3
## 3 9.5 15.5 14.0 326 297.4
## 4 2.0 17.0 7.0 260 297.8
## 5 46.5 13.5 23.5 244 301.4
## 6 5.5 23.0 8.5 254 302.8
model_usage holds the five predictors the model uses, and real_temp stores the observed temperatures for comparison with the predictions.
library(modelr)  # add_predictions()
model_predictions <- model_usage %>% add_predictions(model)
head(model_predictions)
## cloudhigh cloudlow cloudmid ozone surftemp pred
## 1 31.0 6.5 27.0 244 294.6 292.2409
## 2 8.0 10.0 24.5 250 288.3 293.7384
## 3 9.5 15.5 14.0 326 297.4 297.6639
## 4 2.0 17.0 7.0 260 297.8 298.0748
## 5 46.5 13.5 23.5 244 301.4 286.7654
## 6 5.5 23.0 8.5 254 302.8 297.5569
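add_predictions() adds the model's fitted value for each row, which for a linear model is the intercept plus each coefficient times its predictor. A base-R sketch reproducing the first pred value from the coefficients printed by summary(model); since those coefficients are rounded, the last digits may differ slightly:

```r
# Coefficients as printed by summary(model), rounded to 5 decimals
b <- c(intercept = 20.43522, cloudlow = -0.65061, cloudmid = 0.16998,
       cloudhigh = -0.43951, ozone = 0.01669, surftemp = 0.95383)

# Predictors of the first sampled row
row1 <- c(cloudlow = 6.5, cloudmid = 27.0, cloudhigh = 31.0,
          ozone = 244, surftemp = 294.6)

# Intercept plus the sum of coefficient * predictor, matched by name
pred1 <- b[["intercept"]] + sum(b[names(row1)] * row1)
pred1  # about 292.24, matching the first pred value up to rounding
```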
library(plotly)
# pair each observed temperature with its prediction
actual_preddf <- data.frame(real = real_temp$temperature, prediction = model_predictions$pred)
ggplotly(ggplot(actual_preddf) + geom_point(aes(x = real, y = prediction)) +
  geom_abline(intercept = 0, slope = 1, col = "darkturquoise", linewidth = 1) +  # y = x marks perfect predictions; use size = 1 on ggplot2 < 3.4.0
  xlab("Real Temperature") + ylab("Predicted Temperature"))
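The plot compares real and predicted temperatures visually; common numeric summaries are the root mean squared error (RMSE) and the mean absolute error (MAE). The sketch below uses only the six rows printed above, so its values are illustrative rather than a result for all 50 sampled rows; the same expressions applied to actual_preddf$real and actual_preddf$prediction would cover the full sample.

```r
# Observed and predicted temperatures (K) for the six printed rows
real <- c(300.5, 285.8, 291.7, 299.2, 301.4, 302.3)
pred <- c(292.2409, 293.7384, 297.6639, 298.0748, 286.7654, 297.5569)

rmse <- sqrt(mean((real - pred)^2))  # root mean squared error, in kelvin
mae  <- mean(abs(real - pred))       # mean absolute error, in kelvin
c(rmse = rmse, mae = mae)            # about 8.21 and 7.11
```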